Data collected as part of “Decisions about chocolate are processed differently than decisions on gambles: Evidence from eye-tracking” by Betty E. Kim-Viechnicki.
This dataset contains the results of an experiment in which participants were asked to choose one of three chocolates that were displayed on a screen. The information that was provided was the brand, type of chocolate, and the price. Below is an example of the choice that was presented.
Each trial is a set of three choices given to a participant. Each choice consists of three attributes including brand, type, and price. Additional variables include the number of times the respondent fixated on any one of the attributes and whether the product was chosen.
We start by loading the dataset and necessary libraries.
library(plotly)
data <- as.data.frame(read.csv("http://goo.gl/GC1gRs"))
Effects coding was used to distinguish between levels. The following table shows the coding for five brands using four variables. Although Hershey is not a variable in the dataset, it is accounted for by setting the four brand variables to -1.
| Brand | B_Dove | B_Lindt | B_Godiva | B_Ghirardelli |
|---|---|---|---|---|
| Dove | 1 | 0 | 0 | 0 |
| Lindt | 0 | 1 | 0 | 0 |
| Godiva | 0 | 0 | 1 | 0 |
| Ghirardelli | 0 | 0 | 0 | 1 |
| Hershey | -1 | -1 | -1 | -1 |
The same method is used to code the type of chocolate.
| Brand | T_MilkNuts | T_Dark | T_DarkNuts | T_White |
|---|---|---|---|---|
| MilkNuts | 1 | 0 | 0 | 0 |
| Dark | 0 | 1 | 0 | 0 |
| DarkNuts | 0 | 0 | 1 | 0 |
| White | 0 | 0 | 0 | 1 |
| Milk | -1 | -1 | -1 | -1 |
For convenience, we create two factor variables “brand” and “type” that show the individual brand and types. By default, R will automatically code the variable levels.
data$brand <- as.factor(ifelse(data$B_Dove==1,"Dove",
ifelse(data$B_Lindt==1,"Lindt",
ifelse(data$B_Godiva==1,"Godiva",
ifelse(data$B_Ghirardelli==1,"Ghirardelli","Hershey")))))
data$type <- as.factor(ifelse(data$T_MilkNuts==1,"MilkNuts",
ifelse(data$T_Dark==1,"Dark",
ifelse(data$T_DarkNuts==1,"DarkNuts",
ifelse(data$T_White==1,"White","Milk")))))
This chunk performs basic counts on the dataset.
nrow(data) # 1050 observations
length(unique(data$Ind)) # 14 Individuals (Ind) participated
aggregate(data$Trial,list(indi=data$Ind),length) # 75 each number of trials each Ind participated in
We start by exploring the dataset using the summary() function.
summary(data)
## Ind Trial Alt B_Dove
## Min. :2401 Min. : 1 Min. :1 Min. :-1.000000
## 1st Qu.:2405 1st Qu.: 7 1st Qu.:1 1st Qu.: 0.000000
## Median :2410 Median :13 Median :2 Median : 0.000000
## Mean :2409 Mean :13 Mean :2 Mean : 0.006667
## 3rd Qu.:2413 3rd Qu.:19 3rd Qu.:3 3rd Qu.: 0.000000
## Max. :2417 Max. :25 Max. :3 Max. : 1.000000
## B_Lindt B_Godiva B_Ghirardelli
## Min. :-1.0000000 Min. :-1.00000 Min. :-1.00
## 1st Qu.: 0.0000000 1st Qu.: 0.00000 1st Qu.: 0.00
## Median : 0.0000000 Median : 0.00000 Median : 0.00
## Mean : 0.0009524 Mean : 0.02667 Mean :-0.02
## 3rd Qu.: 0.0000000 3rd Qu.: 0.00000 3rd Qu.: 0.00
## Max. : 1.0000000 Max. : 1.00000 Max. : 1.00
## T_MilkNuts T_Dark T_DarkNuts T_White
## Min. :-1.00000 Min. :-1.00000 Min. :-1.0000 Min. :-1.0000
## 1st Qu.: 0.00000 1st Qu.: 0.00000 1st Qu.: 0.0000 1st Qu.: 0.0000
## Median : 0.00000 Median : 0.00000 Median : 0.0000 Median : 0.0000
## Mean : 0.01429 Mean : 0.04095 Mean : 0.0219 Mean : 0.0181
## 3rd Qu.: 0.00000 3rd Qu.: 0.00000 3rd Qu.: 0.0000 3rd Qu.: 0.0000
## Max. : 1.00000 Max. : 1.00000 Max. : 1.0000 Max. : 1.0000
## Price Brand_Fix Type_Fix Price_Fix
## Min. :0.500 Min. : 0.000 Min. : 0.000 Min. :0.0000
## 1st Qu.:1.300 1st Qu.: 1.000 1st Qu.: 2.000 1st Qu.:0.0000
## Median :2.200 Median : 2.000 Median : 3.000 Median :1.0000
## Mean :2.211 Mean : 2.083 Mean : 3.267 Mean :0.9838
## 3rd Qu.:3.175 3rd Qu.: 3.000 3rd Qu.: 4.000 3rd Qu.:2.0000
## Max. :4.000 Max. :18.000 Max. :25.000 Max. :7.0000
## Chosen brand type
## Min. :0.0000 Dove :214 Dark :233
## 1st Qu.:0.0000 Ghirardelli:186 DarkNuts:213
## Median :0.0000 Godiva :235 Milk :190
## Mean :0.3333 Hershey :207 MilkNuts:205
## 3rd Qu.:1.0000 Lindt :208 White :209
## Max. :1.0000
The summary shows that there are no missing values and variables that should be either -1,0,or 1, are indeed so. The summary for brand and type indicate that the number of times the brands and types came up are approximately equal.
The following graphs visualize the variables and how many times each attribute was chosen. As the code for these graphs is largely the same, chunks are hidden unless there is a significant change.
cBrand <- data.frame(xtabs(Chosen ~ brand , data=data))
f <- list(
family = "Arial, sans",
size = 18,
color = "#7f7f7f"
)
x <- list(
title = "Brand",
titlefont = f
)
y <- list(
title = "Number Chosen",
titlefont = f
)
plot_ly(
x = cBrand$brand
, y = cBrand$Freq
, type = "bar"
, filename="r-docs/knitr-example"
) %>%
layout(title = "Chosen by Brand" , xaxis = x, yaxis = y)
## Warning in plot_ly(x = cBrand$brand, y = cBrand$Freq, type = "bar",
## filename = "r-docs/knitr-example"): Ignoring filename. Use plotly_POST()
## if you want to post figures to plotly.
Chosen by brand shows that Godiva gets chosen more frequently than other brands and Hershey is the least chosen. However,it may be premature to conclude that Godiva is the best brand. Without understanding the other attributes that were available in the trial, we may be mistaking correlation for causation.
Chosen by type reveals that people do not like white chocolate. Out of the 350 chocolates that were chosen throughout the entire experiment, white chocolate was only picked 26 times. The next least picked chocolate type is milk chocolate with nuts with 68 being picked. Although we can see the stark difference in white chocolate, it seems that the other four types of chocolates are chosen at roughly the same frequency.
Chosen by price shows that as price increases the number of times the chocolate gets chosen decreases.The prices that were chosen for the experiment range from $0.50 and $4. Although these prices may be typical for chocolate, it should be investigated whether this range is appropriate for a conjoint analysis. One assumption that is made for under a conjoint analysis is that the attributes levels are linear with respect to the response variable. This assumption is further tested in the analysis section.
The following visualization shows a histogram of prices for both alternatives that were chosen and not chosen. The difference in histogram clearly shows that less expensive chocolates (<$2) are chosen more frequently than more expensive chocolate. FOr chocolates that are greater than $2, there is less seperation which indicates price has a different decision impact on more expensive chocolate than on less expensive chocolate.
chosenPrices <- data$Price[which(data$Chosen==1)]
notChosenPrices <- data$Price[which(data$Chosen==0)]
x <- list(
title = "Pricing Bins",
titlefont = f
)
y <- list(
title = "Number Chosen",
titlefont = f
)
plot_ly(x=chosenPrices , opacity = 0.6 , type = "histogram" , name="Chosen") %>%
add_trace(x=notChosenPrices , name="Not Chosen") %>%
layout(barmode="overlay",title="Chosen and Not Chosen Price Histograms")
An aggregated and interactive visualization of fixations over the attributes is available here: (https://pawelb.shinyapps.io/chocolate_slider/)
cBrandFix <- data.frame(table(data$Brand_Fix,data$Chosen))
cTypeFix <- data.frame(table(data$Type_Fix,data$Chosen))
cPriceFix <- data.frame(table(data$Price_Fix,data$Chosen))
x <- list(
title = "Brand_Fix",
titlefont = f
)
y <- list(
title = "Number Chosen",
titlefont = f
)
plot_ly(
x = cBrandFix$Var1
, y = cBrandFix$Freq[which(cBrandFix$Var2==1)]
, opacity = 0.6
, type = "bar"
, name = "Brand"
) %>%
add_trace(x=cTypeFix$Var1
,y=cTypeFix$Freq[which(cTypeFix$Var2==1)]
,name="Type"
) %>%
add_trace(x=cPriceFix$Var1
,y=cPriceFix$Freq[which(cPriceFix$Var2==1)]
,name="Price"
) %>%
layout(barmode="stack" , title = "Chosen by Price_Fix" , xaxis = x, yaxis = y)
This graphic visualizes the relation between the attribute and the number of times the individual look at each. The eye tracking data indicates how many times the individual looked at each attribute before making a decision. The biggest difference between the attributes is the range of values on the x-axis. Price ranges from 0 to 6 whereas brand and type are looked at more often with range between 0 and 25.
The following visual overlays two graphs that show how many chocolates were chosen and not chosen at given prices.
cBT <- xtabs(Chosen ~ brand + type , data=data)
chosenPrices <- data$Price[which(data$Chosen==1)]
notChosenPrices <- data$Price[which(data$Chosen==0)]
plot_ly(x=chosenPrices , opacity = 0.6 , type = "histogram" , name="Chosen") %>%
add_trace(x=notChosenPrices , name="Not Chosen") %>%
layout(barmode="overlay")